Theoretical Results on Reinforcement Learning with Temporally Abstract Options
Authors
Abstract
We present new theoretical results on planning within the framework of temporally abstract reinforcement learning (Precup & Sutton, 1997; Sutton, 1995). Temporal abstraction is a key step in any decision-making system that involves planning and prediction. In temporally abstract reinforcement learning, the agent is allowed to choose among "behaviors", whole courses of action that may be temporally extended, stochastic, and contingent on previous events. Examples of behaviors include closed-loop policies such as picking up an object, as well as primitive actions such as joint torques. Knowledge about the consequences of behaviors is represented by special structures called multi-time models. In this paper we focus on the theory of planning with multi-time models. We define new Bellman equations that are satisfied for sets of multi-time models. As a consequence, multi-time models can be used interchangeably with models of primitive actions in a variety of well-known planning methods including value iteration, policy improvement and policy iteration.
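As a rough illustration of how multi-time models plug into value iteration, the sketch below iterates the option-level Bellman backup V(s) = max_o [ R(s,o) + Σ_s' P(s'|s,o) V(s') ], where each model supplies an expected cumulative reward R and discounted terminal-state probabilities P. The toy two-state problem, the option names, and the dictionary-based model format are illustrative assumptions, not taken from the paper:

```python
# Minimal sketch: value iteration over multi-time (option) models.
# A model of option o gives, for each state s:
#   R[o][s]      - expected cumulative (discounted) reward while o runs
#   P[o][s][s2]  - discounted terminal-state probabilities (gamma folded in,
#                  so each row sums to less than 1 and the backup contracts)

def value_iteration(states, options, R, P, theta=1e-8):
    """Iterate the option-model Bellman backup until the largest change
    in any state's value falls below theta."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            v_new = max(
                R[o][s] + sum(P[o][s][s2] * V[s2] for s2 in states)
                for o in options
            )
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:
            return V

# Hypothetical two-state example; option "a" behaves like a one-step
# primitive action whose model already folds in a discount of 0.9.
states = ["s0", "s1"]
options = ["a", "b"]
R = {"a": {"s0": 1.0, "s1": 0.0}, "b": {"s0": 0.0, "s1": 2.0}}
P = {
    "a": {"s0": {"s0": 0.0, "s1": 0.9}, "s1": {"s0": 0.9, "s1": 0.0}},
    "b": {"s0": {"s0": 0.9, "s1": 0.0}, "s1": {"s0": 0.0, "s1": 0.9}},
}
V = value_iteration(states, options, R, P)
```

Because the model of a primitive action is just a one-step multi-time model, the same backup handles primitive actions and temporally extended behaviors uniformly, which is the interchangeability the abstract refers to.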
Similar resources
Experimental investigation of octagonal partially encased composite columns subject to concentric and eccentric loading
The Partially Encased Composite (PEC) column is one of the recent achievements in the field of composite columns. This paper presents a combined experimental and theoretical study on the mechanical performance of six octagonal PEC columns subjected to axial compressive and bending moment loading. The major difference between them was the concrete reinforcement details. The parameters studied in the...
Heuristically Accelerated Reinforcement Learning: Theoretical and Experimental Results
Since finding control policies using Reinforcement Learning (RL) can be very time-consuming, in recent years several authors have investigated how to speed up RL algorithms by making improved action selections based on heuristics. In this work we present new theoretical results – convergence and a superior limit for value estimation errors – for the class that encompasses all heuristics-based al...
Issues in Using Function Approximation for Reinforcement Learning
Reinforcement learning techniques address the problem of learning to select actions in unknown, dynamic environments. It is widely acknowledged that to be of use in complex domains, reinforcement learning techniques must be combined with generalizing function approximation methods such as artificial neural networks. Little, however, is understood about the theoretical properties of such combina...
Distributional Reinforcement Learning with Quantile Regression
In reinforcement learning an agent interacts with the environment by taking actions and observing the next state and reward. When sampled probabilistically, these state transitions, rewards, and actions can all induce randomness in the observed long-term return. Traditionally, reinforcement learning algorithms average over this randomness to estimate the value function. In this paper, we build ...
Unifying Task Specification in Reinforcement Learning
Reinforcement learning tasks are typically specified as Markov decision processes. This formalism has been highly successful, though specifications often couple the dynamics of the environment and the learning objective. This lack of modularity can complicate generalization of the task specification, as well as obfuscate connections between different task settings, such as episodic and continui...
Multiobjective Reinforcement Learning Using Adaptive Dynamic Programming And Reservoir Computing
This paper introduces a multiobjective reinforcement learning approach which is suitable for large state and action spaces. The approach is based on actor-critic design and reservoir computing. A single reservoir estimates several utilities simultaneously and provides their gradients that are required for the actor, enabling an agent to adapt its behavior in presence of several sources of rewards...
Journal title:
Volume, Issue:
Pages: -
Publication date: 1998